Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders
Yin-Jyun Luo$^{1, 3}$, Chin-Cheng Hsu$^{2}$, Kat Agres$^{3,4}$, Dorien Herremans$^{1,3}$
$^{1}$Singapore University of Technology and Design
$^{2}$University of Southern California
$^{3}$Institute of High Performance Computing, A*STAR, Singapore
$^{4}$Yong Siew Toh Conservatory of Music, National University of Singapore
$\tt yinjyun\_luo@mymail.sutd.edu.sg$
Many-to-Many Singer and Singing Technique Conversion
The following audio samples correspond to Fig. 2 in the paper.
All audio files below are synthesized from Mel-spectrograms using the Griffin-Lim algorithm. The waveforms labeled "original Mel-spectrogram" are therefore the upper bound on audio quality for each conversion.